Abstract. This work is devoted to the study of the stationary measures of
the expansion-modification model. We prove that all initial distributions converge
towards a unique stationary measure exhibiting decay of correlations. We
also develop an argument towards the proof of an asymptotic scaling behavior 
for the correlation function, which allows us to give a closed expression for 
the scaling exponent as a function of the mutation probability. Finally, we 
prove the validity of the asymptotic scaling behavior and the corresponding 
expression for scaling exponents, for low mutation probability. 



1. Introduction.

Since the advent of DNA sequencing techniques, the problem of unveiling the
information contained in nucleotide sequences has become a present-day challenge
for geneticists. A specific issue which has received attention is the problem of
genome evolution. From the theoretical point of view, having a "good" model for
genome evolution is fundamental to phylogenetic reconstruction, a fast-growing
field with numerous applications in a broad range of biological areas [7].
For this purpose, several models have been introduced (such as n-step Markov chains
or hidden Markov chains, among others [SJ |H1 EH H]) to describe the evolution
of nucleotide sequences as well as the patterns and correlations occurring in the
genome. In this paper we are concerned with the model proposed by W. Li [3],
which consists of a sequence (or chain) of symbols that evolves according to a given
discrete-time stochastic dynamics. This dynamics captures the essential processes
which are assumed to be responsible for genome evolution: the random
expansion and modification of symbols, rules that gave the system the name
of expansion-modification model. The latter was originally introduced as a simple
model exhibiting some spatial scaling properties, a behavior which is ubiquitous
in several phenomena found in nature [3]. Subsequently it was used to understand
the scaling properties and the long-range correlations found in real DNA
sequences [21 01 El El HI]. Recently the expansion-modification system has also
been used to investigate the universality of the rank-ordering distributions [TJ 110].

The model we will deal with in this paper can be described as follows. Consider
the random substitution

\[ x \mapsto \begin{cases} \bar{x} & \text{with probability } p, \\ xx & \text{with probability } 1 - p, \end{cases} \]

in the binary set $\{0,1\}$ (here $\bar{x} := 1 - x$ denotes the complementary symbol), and
extend it coordinate-wise to the set $\{0,1\}^+$ of finite binary strings. Starting at time
zero with a seed $\mathbf{x}^0 \in \{0,1\}^+$, and iterating the above substitution, we obtain a sequence

\[ \mathbf{x}^0 \mapsto \mathbf{x}^1 \mapsto \cdots \mapsto \mathbf{x}^n \mapsto \cdots \]

of finite strings of non-decreasing length. Since the applied substitution is a random
map, the sequence we obtain by successive iterations is a random sequence
which is nevertheless supposed to converge, in a certain statistical sense, to a random
string $\mathbf{x}^\infty$. It is easy to see that the probability of having a finite string
after infinitely many iterations is zero; it is therefore more convenient to study
the evolution of infinite strings under the infinite extension of the above substitution.
In this framework we can rigorously define the asymptotic regime of the
expansion-modification process and single out some of its salient characteristics.
In this work we present a mathematical study of the asymptotic regime of the
expansion-modification dynamics and a rigorous proof of the scaling behavior
of the correlation function, which allows us to determine a closed form for
the scaling exponent.

The paper is organized as follows: in Section [2] we set up the mathematical frame- 
work where the expansion-modification system is rigorously defined, then, in Sec- 
tion [3] we prove the existence of a unique stationary distribution towards which 
the dynamics converges. In Section [4] we prove that the unique stationary mea- 
sure exhibits decay of correlations. In Section [5] we prove the asymptotic scaling of 
the correlation function, and we compute a closed expression for the corresponding 
scaling exponent. We finish the paper with some final remarks and comments. 

This work was partially supported by CONACyT-Mexico via the grant No. 129072
and SEP-Mexico through the PIFI program. The authors are indebted to G. Salazar
for his careful reading of the manuscript and the resulting suggestions.

2. The Expansion-Modification Dynamics. 

2.1. We start with some notation. Let $X = \{0,1\}^{\mathbb{N}_0}$, endowed with the
$\sigma$-algebra generated by the cylinder sets. Elements of $X$ will be denoted by boldface
characters like $\mathbf{x} = x_0 x_1 \cdots$ and $\mathbf{y} = y_0 y_1 \cdots$, where $x_i, y_i \in \{0,1\}$.
Finite sequences of symbols, also called words, will also be denoted by boldface letters
while their size will be denoted by $|\cdot|$, i.e., for $\mathbf{v} \in \{0,1\}^k$ we have $|\mathbf{v}| = k$.
A word $\mathbf{v} \in \{0,1\}^k$ occurs as prefix of $\mathbf{x} \in X$, which we denote by $\mathbf{v} \sqsubseteq \mathbf{x}$, if
$\mathbf{v} = x_0 x_1 \cdots x_{k-1}$. We will also use this notation when $\mathbf{x} \in X$ is replaced by a
finite word. Given a configuration $\mathbf{x} \in X$ and integers $0 \le i \le j$, with $\mathbf{x}_i^j$ we
denote the word $x_i x_{i+1} \cdots x_j$ occurring in $\mathbf{x}$. Product of words will be understood
as concatenation: given two words $\mathbf{v} \in \{0,1\}^k$ and $\mathbf{w} \in \{0,1\}^l$ we will denote the
word $\mathbf{u}$ of size $k + l$, with prefix $\mathbf{v}$ and suffix $\mathbf{w}$, by $\mathbf{v}\mathbf{w}$; that is,
$\mathbf{u}_0^{k-1} = \mathbf{v}$ and $\mathbf{u}_k^{k+l-1} = \mathbf{w}$. Consider $S = \{e,m\}^{\mathbb{N}_0}$, where the symbols
$e$ and $m$ stand for expansion and modification respectively. The space $S$, which we will
refer to as the space of substitutions, is endowed with the $\sigma$-algebra generated by the
cylinder sets as well. We will use the same convention to denote the elements of $S$, words and
concatenation of words, as for the symbolic space $X$.

2.2. Let us now define the local substitutions $e, m : \{0,1\} \to \{0,1\}^+ := \bigcup_{n=1}^{\infty} \{0,1\}^n$,
which are given by

\[ e(x) = xx, \qquad m(x) = \bar{x}, \]

where $\bar{x} := 1 - x$. A sequence $\mathbf{s} \in S$ of local substitutions defines the global
substitution $\mathbf{s} : X \to X$ by means of the rule

\[ \mathbf{s}(\mathbf{x}) = \prod_{i \in \mathbb{N}_0} s_i(x_i). \]

Here $\prod$ stands for concatenation of words. Notice that $\mathbf{s}$ replaces the $i$-th symbol
of $\mathbf{x}$ according to the $i$-th local substitution, i.e., if $s_i = e$ then $x_i$ is expanded,
otherwise $x_i$ is modified.
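
As an illustration, the action of a global substitution on a finite word can be sketched in a few lines of Python. This is only a minimal sketch: the function names are ours (hypothetical), words are represented as lists of 0/1 integers, and substitution words as strings over the letters "e" and "m".

```python
def local_substitution(symbol, bit):
    """Apply the local substitution e (expansion) or m (modification) to one bit."""
    if symbol == "e":
        return [bit, bit]      # e(x) = xx
    if symbol == "m":
        return [1 - bit]       # m(x) = complement of x
    raise ValueError("substitution symbols must be 'e' or 'm'")


def apply_substitution(s, x):
    """Concatenate s_i(x_i) over the coordinates of two words of equal length."""
    if len(s) != len(x):
        raise ValueError("s and x must have the same length")
    image = []
    for symbol, bit in zip(s, x):
        image.extend(local_substitution(symbol, bit))
    return image
```

For instance, `apply_substitution("eme", [0, 1, 1])` expands the first and third symbols and flips the second one.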

2.3. The expansion-modification dynamics is a random dynamical system whose
orbits depend on an initial condition and a choice of global substitutions to be
applied to that initial condition. To be more precise, an initial condition $\mathbf{x} \in X$
and a sequence $\mathbf{s}^0, \mathbf{s}^1, \mathbf{s}^2, \ldots$ of configurations in $S$ define the orbit
$\mathbf{x}^0, \mathbf{x}^1, \mathbf{x}^2, \ldots$ in $X$ with $\mathbf{x}^0 := \mathbf{x}$ and, for each $t \ge 0$,
$\mathbf{x}^{t+1} = \mathbf{s}^t(\mathbf{x}^t)$. The choice of the sequences
of global substitutions to be applied is determined by a probability measure on $S$.
At each time step we randomly choose a configuration $\mathbf{s} \in S$ according to that
measure, and then we apply the corresponding global substitution. The measure
according to which we select the sequences of substitutions is taken from the family
of Bernoulli measures $\{\nu_p : p \in [0,1]\}$ defined as follows: for a given $p \in [0,1]$, $\nu_p$ is
the product measure such that $\nu_p[m] = p$ and $\nu_p[e] = 1 - p$, i.e., $\nu_p$ corresponds to a
random sequence of modifications and expansions which are selected independently,
with probability $p$ for the first and $1 - p$ for the second. Below we
will refer to $p$ as the mutation probability. The probability of events involving a
finite number of coordinates depends only on the measures of cylinder sets in $S$
and in $X$. We say that the system has an asymptotic behavior if, regardless of the
measure describing the initial distribution, the time-$t$ distribution converges in the
weak* sense as $t$ goes to infinity, i.e., the probability of all events involving a finite
number of coordinates converges as the time goes to infinity.
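
The random orbit just described can be simulated on finite prefixes. The following Python sketch is ours and rests on two stated assumptions: the names `random_step` and `orbit` are hypothetical, and the word is truncated to a fixed maximal length after each step. The truncation is harmless because every symbol produces at least one symbol, so the first $L$ symbols of $\mathbf{s}(\mathbf{x})$ depend only on the first $L$ symbols of $\mathbf{x}$.

```python
import random


def random_step(word, p, rng):
    """Apply one random global substitution drawn from the Bernoulli measure nu_p."""
    image = []
    for bit in word:
        if rng.random() < p:
            image.append(1 - bit)       # modification, probability p
        else:
            image.extend((bit, bit))    # expansion, probability 1 - p
    return image


def orbit(seed_word, p, steps, rng, max_len=10_000):
    """Iterate the dynamics, keeping only a prefix of at most max_len symbols."""
    word = list(seed_word)
    for _ in range(steps):
        word = random_step(word, p, rng)[:max_len]
    return word
```

A call such as `orbit([0], 0.2, 40, random.Random(1))` returns a prefix of one realization after 40 time steps; passing the generator explicitly keeps runs reproducible.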

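2.4. Let us assume that the time-$t$ configurations, $\mathbf{x}^t$, are distributed according
to the measure $\mu^t$ on $X$. The distribution $\mu^{t+1}$ of time-$(t+1)$ configurations is
completely determined by $\nu_p$ and $\mu^t$ according to the following expression:

\[ \mu^{t+1}\{(\mathbf{x}^{t+1})_0^{\ell} = \mathbf{a}\} = \sum_{\mathbf{c} \in \{e,m\}^{\ell+1}} \ \sum_{\substack{\mathbf{b} \in \{0,1\}^{\ell+1} \\ \mathbf{a} \sqsubseteq \prod_{i=0}^{\ell} c_i(b_i)}} \mu^{t}\{(\mathbf{x}^{t})_0^{\ell} = \mathbf{b}\}\, \nu_p\{(\mathbf{s}^{t})_0^{\ell} = \mathbf{c}\}, \tag{1} \]

for each $\ell \in \mathbb{N}_0$ and $\mathbf{a} \in \{0,1\}^{\ell+1}$. As mentioned before, $\mathbf{a} \sqsubseteq \mathbf{b}$ means that the
word $\mathbf{a}$ occurs as a prefix of the word $\mathbf{b}$. Hence, the evolution of the length-$\ell$
marginal is nothing but a Markov chain. Indeed, considering the length-$\ell$ marginal
of a measure $\mu$ as a probability vector of dimension $2^{\ell+1}$, the length-$\ell$ marginal
$\mu_\ell^t$ of the time-$t$ distribution is given by the matrix product $\mu_\ell^t = \mu_\ell^0 M_\ell^t$, where
$M_\ell : \{0,1\}^{\ell+1} \times \{0,1\}^{\ell+1} \to [0,1]$ is the $2^{\ell+1} \times 2^{\ell+1}$ stochastic matrix given by

\[ M_\ell(\mathbf{a},\mathbf{b}) = \sum_{\substack{\mathbf{c} \in \{e,m\}^{\ell+1} \\ \mathbf{b} \sqsubseteq \prod_{i=0}^{\ell} c_i(a_i)}} \nu_p[\mathbf{c}]. \]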
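
To make the Markov chain on marginals concrete, the matrix $M_\ell$ can be built by brute-force enumeration of the $2^{\ell+1}$ substitution words. The sketch below is a hedged illustration, not the authors' code: the names `image_prefix` and `transition_matrix` are ours, words are tuples of bits listed in lexicographic order, and the entry `M[i][j]` is the total $\nu_p$-probability of the substitution words that send the $i$-th word to a word having the $j$-th word as prefix.

```python
from itertools import product


def image_prefix(c, a, length):
    """First `length` symbols of the concatenation of c_i(a_i)."""
    image = []
    for symbol, bit in zip(c, a):
        image.extend((bit, bit) if symbol == "e" else (1 - bit,))
        if len(image) >= length:
            break
    return tuple(image[:length])


def transition_matrix(ell, p):
    """Stochastic matrix of the length-ell marginal (words of ell + 1 symbols)."""
    words = list(product((0, 1), repeat=ell + 1))
    index = {w: i for i, w in enumerate(words)}
    M = [[0.0] * len(words) for _ in words]
    for a in words:
        for c in product("em", repeat=ell + 1):
            weight = 1.0
            for symbol in c:
                weight *= p if symbol == "m" else 1.0 - p
            b = image_prefix(c, a, ell + 1)
            M[index[a]][index[b]] += weight
    return words, M
```

Since every word of $\ell + 1$ symbols has an image of at least $\ell + 1$ symbols, each row of the matrix sums to one.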
3. The asymptotic distribution.

Let us assume for the moment that for each $\ell \in \mathbb{N}_0$, the stochastic matrix $M_\ell$ is
primitive. Then, the Perron-Frobenius Theorem ensures the existence of a unique
probability vector $\mu_\ell : \{0,1\}^{\ell+1} \to (0,1]$ such that

\[ \mu_\ell = \mu_\ell M_\ell \quad \text{and} \quad \lim_{t \to \infty} \mu_\ell^0 M_\ell^t = \mu_\ell, \]

for each probability vector $\mu_\ell^0 : \{0,1\}^{\ell+1} \to [0,1]$. Hence, for any measure $\mu^0$
specifying the distribution of the initial conditions, for each $\ell \in \mathbb{N}_0$, and for all
$\mathbf{a} \in \{0,1\}^{\ell+1}$ we have

\[ \lim_{t \to \infty} \mu^t[\mathbf{a}] = \mu_\ell(\mathbf{a}). \tag{2} \]

If in addition the probability vectors $\mu_\ell$ satisfy the compatibility condition

\[ \sum_{x \in \{0,1\}} \mu_{\ell+1}(\mathbf{a}x) = \mu_\ell(\mathbf{a}) \tag{3} \]

for each $\ell \in \mathbb{N}_0$ and $\mathbf{a} \in \{0,1\}^{\ell+1}$, then Kolmogorov's Consistency Theorem implies
the existence of a measure $\mu$ on $X$ such that $\mu[\mathbf{a}] = \mu_\ell(\mathbf{a})$ for each $\ell \in \mathbb{N}_0$ and
$\mathbf{a} \in \{0,1\}^{\ell+1}$. Finally, Equation (2) ensures the convergence of $\mu^t$ towards $\mu$ in the
weak* sense.



The primitivity of $M_\ell$ is a consequence of the following argument. As we prove
in Appendix A, for each pair of words $\mathbf{a}, \mathbf{b} \in \{0,1\}^{\ell+1}$ there exists a sequence of
substitutions which, applied to $\mathbf{a}$, produces a word having $\mathbf{b}$ as prefix. Now, since
all words in $\{e,m\}^{\ell+1}$ have positive probability, the previous claim implies that
$M_\ell^n(\mathbf{a},\mathbf{b}) > 0$ for some $n > 0$, which proves that $M_\ell$ is irreducible. Now, since the
word $00\cdots0$ occurs as the prefix of $e(0)e(0)\cdots e(0)$, then $M_\ell(00\cdots0, 00\cdots0) > 0$,
which implies that $M_\ell$ is aperiodic; therefore $M_\ell$ is primitive.



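Now, the compatibility condition (3) is inherited from the analogous compatibility
condition satisfied by the marginals $\mu_\ell^t$ at each time $t \in \mathbb{N}_0$. Indeed, for $t = 0$ we
obviously have

\[ \sum_{x \in \{0,1\}} \mu^0_{\ell+1}(\mathbf{a}x) := \sum_{x \in \{0,1\}} \mu^0[\mathbf{a}x] = \mu^0\Big( \bigsqcup_{x \in \{0,1\}} [\mathbf{a}x] \Big) = \mu^0[\mathbf{a}] =: \mu^0_\ell(\mathbf{a}), \]

for each $\ell \in \mathbb{N}_0$ and $\mathbf{a} \in \{0,1\}^{\ell+1}$. Here $\bigsqcup$ stands for the disjoint union. Now, from
Equation (1) it follows that

\[ \begin{aligned}
\sum_{x \in \{0,1\}} \mu^{t+1}_{\ell+1}(\mathbf{a}x)
&= \sum_{x \in \{0,1\}} \ \sum_{\mathbf{s} \in \{e,m\}^{\ell+2}} \nu_p[\mathbf{s}] \sum_{\substack{\mathbf{b} \in \{0,1\}^{\ell+2} \\ \mathbf{a}x \sqsubseteq \prod_{i=0}^{\ell+1} s_i(b_i)}} \mu^t[\mathbf{b}] \\
&= \sum_{\mathbf{s} \in \{e,m\}^{\ell+2}} \nu_p[\mathbf{s}]\, \mu^t\Big( \bigsqcup_{x \in \{0,1\}} \ \bigsqcup_{\substack{\mathbf{b} \in \{0,1\}^{\ell+2} \\ \mathbf{a}x \sqsubseteq \prod_{i=0}^{\ell+1} s_i(b_i)}} [\mathbf{b}] \Big) \\
&= \sum_{\mathbf{s} \in \{e,m\}^{\ell+2}} \nu_p[\mathbf{s}]\, \mu^t\Big( \bigsqcup_{\substack{\mathbf{b} \in \{0,1\}^{\ell+2} \\ \mathbf{a} \sqsubseteq \prod_{i=0}^{\ell+1} s_i(b_i)}} [\mathbf{b}] \Big).
\end{aligned} \]

Since $|\mathbf{a}| = \ell + 1$, the statement $\mathbf{a} \sqsubseteq \prod_{i=0}^{\ell+1} s_i(b_i)$ is equivalent to
$\mathbf{a} \sqsubseteq \prod_{i=0}^{\ell} s_i(b_i)$, and we have

\[ \begin{aligned}
\sum_{x \in \{0,1\}} \mu^{t+1}_{\ell+1}(\mathbf{a}x)
&= \sum_{\mathbf{s} \in \{e,m\}^{\ell+2}} \nu_p[\mathbf{s}]\, \mu^t\Big( \bigsqcup_{\substack{\mathbf{b} \in \{0,1\}^{\ell+2} \\ \mathbf{a} \sqsubseteq \prod_{i=0}^{\ell} s_i(b_i)}} [\mathbf{b}] \Big) \\
&= \sum_{\mathbf{s} \in \{e,m\}^{\ell+1}} \Big( \sum_{\rho \in \{e,m\}} \nu_p[\mathbf{s}\rho] \Big)\, \mu^t\Big( \bigsqcup_{\substack{\mathbf{b} \in \{0,1\}^{\ell+1} \\ \mathbf{a} \sqsubseteq \prod_{i=0}^{\ell} s_i(b_i)}} [\mathbf{b}] \Big) \\
&= \sum_{\mathbf{s} \in \{e,m\}^{\ell+1}} \nu_p[\mathbf{s}] \sum_{\substack{\mathbf{b} \in \{0,1\}^{\ell+1} \\ \mathbf{a} \sqsubseteq \prod_{i=0}^{\ell} s_i(b_i)}} \mu^t[\mathbf{b}]
= \big( \mu^t_\ell M_\ell \big)(\mathbf{a}),
\end{aligned} \]

for each $\ell \in \mathbb{N}_0$ and $\mathbf{a} \in \{0,1\}^{\ell+1}$. The compatibility condition (3) follows by taking
the limit $t \to \infty$ on both sides of the equation.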

The above arguments prove the following:

Theorem 1. For each $p \in (0,1)$ there exists a unique measure $\mu_p$ on $X$ which is
invariant under the expansion-modification dynamics. Furthermore, starting from
any measure $\mu^0$ determining the distribution of the initial conditions, the measure
$\mu^t$, corresponding to the distribution at time $t$, converges in the weak* sense to $\mu_p$.



The above discussion applies to any random global substitution defined by a finite 
collection of substitutions on a finite alphabet. The existence of an asymptotic be- 
havior depends only on the fact that the stochastic matrices governing the evolution 
of the marginals are primitive. 
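
The convergence of the marginals can be observed numerically by iterating the marginal chain from an arbitrary initial vector. The Python sketch below is an illustration under our own assumptions: the names `push_forward` and `stationary_marginal` are hypothetical, marginals are stored as dictionaries from words (tuples of bits) to probabilities, the initial condition is a point mass on the word $00\cdots0$, and a fixed number of iterations is used instead of a convergence test.

```python
from itertools import product


def push_forward(mu, ell, p):
    """One time step of the marginal chain on words of ell + 1 symbols."""
    new = {word: 0.0 for word in mu}
    for b, mass in mu.items():
        for c in product("em", repeat=ell + 1):
            weight = mass
            image = []
            for symbol, bit in zip(c, b):
                weight *= p if symbol == "m" else 1.0 - p
                image.extend((bit, bit) if symbol == "e" else (1 - bit,))
            new[tuple(image[: ell + 1])] += weight
    return new


def stationary_marginal(ell, p, iterations=500):
    """Approximate the stationary length-ell marginal by repeated push-forward."""
    words = list(product((0, 1), repeat=ell + 1))
    mu = {word: (1.0 if i == 0 else 0.0) for i, word in enumerate(words)}
    for _ in range(iterations):
        mu = push_forward(mu, ell, p)
    return mu
```

Comparing `stationary_marginal(1, p)` with `stationary_marginal(0, p)` gives a direct numerical check of the compatibility condition between consecutive marginals.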
4. Decay of correlations.

The two-sites correlation function, $C_p : \mathbb{N} \to \mathbb{R}$, is given by

\[ C_p(n) := \int_X x_0 x_n \, d\mu_p(\mathbf{x}) - \Big( \int_X x_0 \, d\mu_p(\mathbf{x}) \Big) \Big( \int_X x_n \, d\mu_p(\mathbf{x}) \Big), \]

where $x_n$ denotes, as usual, the projection of $\mathbf{x}$ on the $n$-th coordinate. Since the
expansion-modification dynamics is invariant under the flip $x_n \leftrightarrow \bar{x}_n$ (with
$\bar{x}_n := 1 - x_n$) for each $n \in \mathbb{N}_0$, the stationary measure $\mu_p$ has to satisfy
$\mu_p\{x_n = 1\} = \mu_p\{x_n = 0\} = 1/2$ for all $n \in \mathbb{N}_0$. Therefore

\[ C_p(n) = \int_X x_0 x_n \, d\mu_p(\mathbf{x}) - 1/4 = \mu_p\{x_0 = x_n = 1\} - 1/4. \]

We will prove that $\mu_p$ has decay of correlations, i.e., that $\lim_{n \to \infty} C_p(n) = 0$, for
each $p \in (0,1)$.

Since $\mu_p$ is flip-invariant, then $\mu_p\{x_0 = x_n = 1\} = \mu_p\{x_0 = x_n = 0\}$ and
$\mu_p\{x_0 = 0 \neq x_n = 1\} = \mu_p\{x_0 = 1 \neq x_n = 0\}$, for each $n \in \mathbb{N}_0$, hence

\[ C_p(n) = \frac{1}{2} \big( \mu_p\{x_0 = x_n\} - 1/2 \big) = \frac{1}{4} \big( \mu_p\{x_0 = x_n\} - \mu_p\{x_0 \neq x_n\} \big). \]

Now, since $\mu_p$ is invariant under the expansion-modification dynamics, then

\[ \begin{aligned}
\mu_p\{x_0 = x_n\} ={}& \sum_{k=\lfloor n/2 \rfloor}^{n} \mu_p\{x_0 = x_k\}\, \nu_p\{s_0 = s_k,\ \ell(\mathbf{s}_0^k) = n + 1\} \\
&+ \sum_{k=\lceil n/2 \rceil}^{n} \mu_p\{x_0 = x_k\}\, \nu_p\{s_0 = s_k = e,\ \ell(\mathbf{s}_0^k) = n + 2\} \\
&+ \sum_{k=\lfloor n/2 \rfloor}^{n} \mu_p\{x_0 \neq x_k\}\, \nu_p\{s_0 \neq s_k,\ \ell(\mathbf{s}_0^k) = n + 1\} \\
&+ \sum_{k=\lceil n/2 \rceil}^{n} \mu_p\{x_0 \neq x_k\}\, \nu_p\{s_0 \neq s_k = e,\ \ell(\mathbf{s}_0^k) = n + 2\},
\end{aligned} \]

and similarly for $\mu_p\{x_0 \neq x_n\}$. Here and below we denote by $\ell(\mathbf{s}_0^k)$ the length of
the words obtained by applying the substitution $\mathbf{s}_0^k$. From the previous equation
and its analogue for $\mu_p\{x_0 \neq x_n\}$, it follows that

\[ C_p(n) = \sum_{k=\lfloor n/2 \rfloor}^{n} C_p(k) \big( f(p)\,\nu_p(k,n) + g(p)\,\nu_p(k,n-1) + h(p)\,\nu_p(k,n-2) \big), \tag{4} \]

where

\[ f(p) = p(2p-1), \qquad g(p) = (1-p)(1-3p), \qquad h(p) = (1-p)^2, \]

and for each $k, n \in \mathbb{N}$

\[ \nu_p(k,n) := \nu_p\{\ell(\mathbf{s}_1^{k-1}) = n - 1\} = \binom{k-1}{n-k} (1-p)^{n-k} p^{2k-n-1}. \tag{5} \]

From this point on we will use the notation $S_p(n) := \sum_{k=\lceil n/2 \rceil}^{n} \nu_p(k,n)$. It follows,
from a straightforward computation, that $S_p(n+1) = p\,S_p(n) + (1-p)\,S_p(n-1)$.
This recursion, starting from $S_p(0) = 0$ and $S_p(1) = 1$, gives

\[ S_p(n) = \frac{1 - (p-1)^n}{2-p}. \tag{6} \]
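
The closed form for $S_p(n)$ is easy to cross-check numerically against the binomial sum and against the two-term recursion. The following Python sketch does so; the function names are ours, and the binomial expression for $\nu_p(k,n)$ is taken to vanish outside the range $0 \le n - k \le k - 1$.

```python
from math import comb


def nu(p, k, n):
    """nu_p(k, n): k - 1 symbols produce n - 1 symbols, i.e. n - k expansions."""
    j = n - k
    if k < 1 or j < 0 or j > k - 1:
        return 0.0
    return comb(k - 1, j) * (1.0 - p) ** j * p ** (2 * k - n - 1)


def S_sum(p, n):
    """S_p(n) as the sum of nu_p(k, n) over ceil(n/2) <= k <= n."""
    return sum(nu(p, k, n) for k in range((n + 1) // 2, n + 1))


def S_closed(p, n):
    """Closed form (1 - (p - 1)^n) / (2 - p)."""
    return (1.0 - (p - 1.0) ** n) / (2.0 - p)
```

For moderate values of `n` the two functions agree up to rounding errors, and `S_closed` satisfies $S_p(n+1) = p\,S_p(n) + (1-p)\,S_p(n-1)$.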

Now, since for $0 < p < 1/3$ we have $f(p) < 0 < g(p), h(p)$, then, by taking absolute
values on both sides of (4) we obtain

\[ |C_p(n)| \le \max_{n/2 \le k \le n} |C_p(k)| \, \big( -f(p)\,S_p(n) + g(p)\,S_p(n-1) + h(p)\,S_p(n-2) \big). \]

From this and (6) we obtain

\[ |C_p(n)| \le \max_{n/2 \le k \le n} |C_p(k)| \, \big( (1-2p) + 2p(1-p)^n \big). \tag{7} \]

For $1/3 < p < 1/2$ we have $f(p), g(p) < 0 < h(p)$, therefore

\[ \begin{aligned}
|C_p(n)| &\le \max_{n/2 \le k \le n} |C_p(k)| \, \big( -f(p)\,S_p(n) - g(p)\,S_p(n-1) + h(p)\,S_p(n-2) \big) \\
&\le \max_{n/2 \le k \le n} |C_p(k)| \, \Big( \frac{p(3-4p)}{2-p} + \frac{2(1-p)^n(1-p-p^2)}{2-p} \Big).
\end{aligned} \tag{8} \]

Finally, for $1/2 < p < 1$ we have $g(p) < 0 < f(p), h(p)$, and then

\[ \begin{aligned}
|C_p(n)| &\le \max_{n/2 \le k \le n} |C_p(k)| \, \big( -g(p)\,S_p(n-1) + f(p)\,S_p(n) + h(p)\,S_p(n-2) \big) \\
&\le \max_{n/2 \le k \le n} |C_p(k)| \, \Big( \frac{p}{2-p} + \frac{2(1-p)^{n+2}}{2-p} \Big).
\end{aligned} \tag{9} \]

All of the inequalities (7), (8) and (9) have the form

\[ |C_p(n)| \le \max_{n/2 \le k \le n} |C_p(k)| \, \big( a(p) + \epsilon_p(n) \big), \]

with $a(p) \in (0,1)$ and $\lim_{n \to \infty} \epsilon_p(n) = 0$. Taking the limsup on both sides of the
inequality we obtain

\[ \limsup_{n \to \infty} |C_p(n)| \le a(p) \limsup_{n \to \infty} \max_{n/2 \le k \le n} |C_p(k)| = a(p) \limsup_{n \to \infty} |C_p(n)|, \]

which implies $\lim_{n \to \infty} C_p(n) = 0$.
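
In the three regimes the leading constant $a(p)$ is the limit of the signed combination of $S_p(n)$, $S_p(n-1)$ and $S_p(n-2)$, that is, $(|f(p)| + |g(p)| + |h(p)|)/(2-p)$. The short Python sketch below evaluates it; it assumes $h(p) = (1-p)^2$, and the function names are ours.

```python
def f(p):
    return p * (2 * p - 1)


def g(p):
    return (1 - p) * (1 - 3 * p)


def h(p):
    return (1 - p) ** 2


def contraction_factor(p):
    """Leading constant a(p) in the bounds of the form a(p) + eps_p(n)."""
    return (abs(f(p)) + abs(g(p)) + abs(h(p))) / (2 - p)
```

This reproduces $a(p) = 1 - 2p$ for $p < 1/3$, $a(p) = p(3-4p)/(2-p)$ for $1/3 < p < 1/2$ and $a(p) = p/(2-p)$ for $p > 1/2$; the three expressions coincide at the boundaries $p = 1/3$ and $p = 1/2$, where $a(p) = 1/3$.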

In this way we have proved the following:

Theorem 2. For each $p \in (0,1)$, the stationary measure $\mu_p$ has decay of correlations.

5. Scaling behavior.



5.1. The two-sites correlation function follows an asymptotic scaling law with
exponent varying with the mutation probability. As shown by the following heuristic
argument, the scaling exponent varies with the mutation probability in a piecewise
smooth way.

In the previous section we proved that

\[ C_p(n) = \sum_{k=\lfloor n/2 \rfloor}^{n} C_p(k) \big( f(p)\,\nu_p(k,n) + g(p)\,\nu_p(k,n-1) + h(p)\,\nu_p(k,n-2) \big). \]

The distribution $k \mapsto \nu_p(k,n)$, which is unimodal with maximum at $k \approx n/(2-p)$,
steepens around this maximum as $n$ goes to infinity in such a way that

\[ \sum_{k=\lfloor n/2 \rfloor}^{n} \nu_p(k,n) \approx \sum_{\ell(n) \le k \le u(n)} \nu_p(k,n), \]

where $\ell(n) < n/(2-p) < u(n)$ are such that $(2-p)\ell(n)/n,\ (2-p)u(n)/n \to 1$ as
$n \to \infty$. Hence, assuming a slow variation in $k \mapsto C_p(k)$, we have

\[ \begin{aligned}
C_p(n) &\approx \sum_{\ell(n) \le k \le u(n)} C_p(k) \big( f(p)\,\nu_p(k,n) + g(p)\,\nu_p(k,n-1) + h(p)\,\nu_p(k,n-2) \big) \\
&\approx C_p\Big( \frac{n}{2-p} \Big) \sum_{\ell(n) \le k \le u(n)} \big( f(p)\,\nu_p(k,n) + g(p)\,\nu_p(k,n-1) + h(p)\,\nu_p(k,n-2) \big) \\
&\approx C_p\Big( \frac{n}{2-p} \Big) \sum_{k=\lfloor n/2 \rfloor}^{n} \big( f(p)\,\nu_p(k,n) + g(p)\,\nu_p(k,n-1) + h(p)\,\nu_p(k,n-2) \big) \\
&= C_p\Big( \frac{n}{2-p} \Big) \big( f(p)\,S_p(n) + g(p)\,S_p(n-1) + h(p)\,S_p(n-2) \big),
\end{aligned} \]

where $S_p(n) := \sum_{k=\lfloor n/2 \rfloor}^{n} \nu_p(k,n) = (1 - (p-1)^n)/(2-p)$, as proved in Section 4.
From this we finally obtain the approximate scaling relation

\[ C_p\big( (2-p)\,n \big) \approx \frac{(1-2p)(2-3p)}{2-p}\, C_p(n), \]

which translates into the scaling law $C_p(n) \approx C_p(n_0)\,(n/n_0)^{-\beta_p}$ with

\[ \beta_p = \frac{\log(2-p) - \log(1-2p) - \log(2-3p)}{\log(2-p)}. \tag{10} \]

From (4) we readily obtain the recurrence relation

\[ C_p(n) = \frac{\sum_{k=\lfloor n/2 \rfloor}^{n-1} C_p(k) \big( f(p)\,\nu_p(k,n) + g(p)\,\nu_p(k,n-1) + h(p)\,\nu_p(k,n-2) \big)}{1 + p^{n}(1-2p)}, \]

obtained by isolating the term $k = n$, for which $\nu_p(n,n) = p^{n-1}$ and
$\nu_p(n,n-1) = \nu_p(n,n-2) = 0$. We use this recurrence to numerically compute the
two-point correlation function for different values of the mutation probability. As
shown in Figure 1, the numerical computations confirm that the two-point correlation
function approximately follows a power law behavior. Furthermore, according to
Figure 2, the theoretically predicted exponents, $\{-\beta_p : 0 < p < 1\}$, fit very well
the ones obtained by linear regression from the numerically computed correlation functions.
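
A numerical computation of this kind can be sketched as follows. The Python code below is our own illustration, not the authors' program, and it rests on several assumptions that are flagged in the comments: $h(p) = (1-p)^2$; the recurrence is solved for $C_p(n)$ by isolating the term $k = n$ of the sum; the seed values are $C_p(0) = 1/4$ and $C_p(1) = 1/(4(1+2p))$, the latter obtained by solving the stationary two-site equation by hand; and absolute values are used inside the logarithms of the exponent so that the formula can also be evaluated for $p > 1/2$.

```python
from math import exp, lgamma, log


def nu(p, k, n):
    """nu_p(k, n) = C(k-1, n-k) (1-p)^(n-k) p^(2k-n-1); zero outside its support."""
    j = n - k
    if k < 1 or j < 0 or j > k - 1:
        return 0.0
    log_binom = lgamma(k) - lgamma(j + 1) - lgamma(k - j)
    return exp(log_binom + j * log(1 - p) + (k - 1 - j) * log(p))


def weight(p, k, n):
    """f nu(k, n) + g nu(k, n-1) + h nu(k, n-2), with h(p) = (1-p)^2 assumed."""
    f = p * (2 * p - 1)
    g = (1 - p) * (1 - 3 * p)
    h = (1 - p) ** 2
    return f * nu(p, k, n) + g * nu(p, k, n - 1) + h * nu(p, k, n - 2)


def correlations(p, n_max):
    """C_p(0..n_max) from the recurrence; the two seed values are assumptions."""
    C = [0.25, 0.25 / (1 + 2 * p)]
    for n in range(2, n_max + 1):
        total = sum(C[k] * weight(p, k, n) for k in range(n // 2, n))
        C.append(total / (1 + p ** n * (1 - 2 * p)))
    return C


def beta(p):
    """Predicted exponent; undefined at p = 1/2 and p = 2/3 (log of zero)."""
    kappa = abs((1 - 2 * p) * (2 - 3 * p)) / (2 - p)
    return -log(kappa) / log(2 - p)
```

The cost of `correlations` grows quadratically with `n_max`. A log-log regression of the returned values against $n$ can then be compared with `-beta(p)`; note that the computed sequence need not be monotone for small $n$.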
Figure 1. Log-log plot of the two-sites correlation function. A
power law behavior clearly appears.

Figure 2. Exponents obtained by the best power law fit to the
two-sites correlation function compared to the theoretical asymptotic
exponents $-\beta_p$.

The heuristic argument developed above suggests that the stationary measure $\mu_p$
varies in a piecewise smooth way with $p$. This variation is reflected in the behavior
of the two-sites correlation function $C_p$, which appears to follow a power law decay
in the whole interval $0 < p < 1$, except for the two singularities we
find at $p = 1/2$ and $p = 2/3$. At precisely those values of $p$, the two-sites correlation
function appears to decay faster than any power law.

5.2. The previous arguments can be refined into the following

Theorem 3. Let $p \in (0, 1/2) \cup (2/3, 1)$ and suppose there are constants $0 < a < b$
and $n_0 \in \mathbb{N}$, such that

\[ n^{-b} \le C_p(n) \le n^{-a} \tag{11} \]

for all $n \ge n_0$. Then, there exist $m_0 \ge n_0$ and constants $A < 1 < B$ such that

\[ A\,n^{-\beta_p} \le C_p(n) \le B\,n^{-\beta_p} \]

for all $n \ge m_0$.



Proof. Fix $\beta > b$ and let $d(x) := \sqrt{p(1-p)(2-p)(\beta+1)\log(x)/x}$. Following a
standard concentration estimate, which we present in Appendix B, we prove that
there exists $n_2 \in \mathbb{N}$ such that

\[ \delta_n := n^{b} \Big( \sum_{|n/k - (2-p)| > d(n)} \nu_p(k,n) \Big) \le n^{-(\beta-b)/2} \tag{12} \]

for all $n \ge n_2$.

Let $\ell, u : [1, \infty) \to [0, \infty)$ be such that

\[ \ell(x) := \frac{x}{2 - p + d(x)}, \qquad u(x) := \frac{x}{2 - p - d(x)}, \]

with $d$ as above. Let $f$, $g$, $h$ and $\nu_p$ be as in Section 4 and define

\[ W_p(k,n) := f(p)\,\nu_p(k,n) + g(p)\,\nu_p(k,n-1) + h(p)\,\nu_p(k,n-2). \]

Since $\ell(n) \le k \le u(n) \Rightarrow |n/k - (2-p)| \le d(n)$, $|C_p(k)| \le 1$ for all $k$, and since
$|f(p)| + |g(p)| + |h(p)| \le 1$ for all $p \in (0,1)$, then, using (4) and taking into
account (12), we obtain

\[ C_p(n) \le \sum_{\ell(n) \le k \le u(n)} C_p(k)\,W_p(k,n) + n^{-b}\delta_n, \tag{13} \]

\[ C_p(n) \ge \sum_{\ell(n) \le k \le u(n)} C_p(k)\,W_p(k,n) - n^{-b}\delta_n, \tag{14} \]

for each $n \ge n_2$.

In Appendix C we prove that there exists $n_1 \in \mathbb{N}$ such that $W_p(k,n) > 0$ in the
interval $\ell(n) \le k \le u(n)$ for all $p \in (0, 1/2) \cup (2/3, 1)$ and all $n \ge n_1$. For those
values of $p$ we can define a probability distribution $k \mapsto P_p(k,n)$ proportional to
$k \mapsto W_p(k,n)$, in the interval $\ell(n) \le k \le u(n)$.



Assuming (11) we can rewrite Inequalities (13) and (14) as

\[ C_p(n) \le \Big[ \sum_{k=\lfloor n/2 \rfloor}^{n} W_p(k,n) + 2\delta_n \Big] E_{p,n}(C_p), \]

\[ C_p(n) \ge \Big[ \sum_{k=\lfloor n/2 \rfloor}^{n} W_p(k,n) - 2\delta_n \Big] E_{p,n}(C_p), \]

where $E_{p,n}(C_p)$ denotes the mean value of $C_p$ with respect to $P_p(k,n)$. We have
proved in Section 4 that $S_p(n) := \sum_{k=\lceil n/2 \rceil}^{n} \nu_p(k,n) = (2-p)^{-1}(1 - (p-1)^n)$;
therefore $\sum_{k=\lfloor n/2 \rfloor}^{n} W_p(k,n) = (1-2p)(2-3p)/(2-p) - 2p(1+p)(p-1)^n/(2-p)$ and we obtain

\[ \begin{aligned}
C_p(n) &\le \Big( \frac{(1-2p)(2-3p)}{2-p} + 3\delta_n \Big) E_{p,n}(C_p) \\
&\le \Big( \frac{(1-2p)(2-3p)}{2-p} + 3\delta_n \Big) \max_{\ell(n) \le k \le u(n)} C_p(k),
\end{aligned} \]
C P ([A>]) > m p ,= min C p ([*]), 



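\[ \begin{aligned}
C_p(n) &\ge \Big( \frac{(1-2p)(2-3p)}{2-p} - 3\delta_n \Big) E_{p,n}(C_p) \\
&\ge \Big( \frac{(1-2p)(2-3p)}{2-p} - 3\delta_n \Big) \min_{\ell(n) \le k \le u(n)} C_p(k),
\end{aligned} \]

for all $n \ge \max(n_1, n_2)$. To ease the notation, let

\[ \lambda_p := (2-p), \qquad \varphi(x) := \ell(\lambda_p x)/\lambda_p \qquad \text{and} \qquad \psi(x) := u(\lambda_p x)/\lambda_p. \]

With this we can rewrite the previous inequality as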

a/ p "' u - min C p ([y]) < C p ([X p x]) < A^* +,x " max C p ([y]), 

A p 0(a:)<j/<Api/)(2;) \ p <j>(x)<y<A p il>(x) 

with jfe. := 3Xp 0p {x - l)-^- 6 )/ 2 / log(A p ) > 3A P ^ 5 [a:] / log(A p ). It follows from 
this, by straightforward induction, that 

C P ([A>]) < m p A ^ <A * "> max C p ([*]), 

X p (p k (X p x)<z<X p ^ 2 (X p x) 
X p (p k ( Ap - 1 x)<2< Xpijj 2 (X P 1 x) 

for all fc € N and x > max(ni, 77,2). 

To finish the proof we use the following inequalities, which we prove in Appendix D.
There exists $x_0 > e$ such that for all $x \ge x_0$ there are constants $0 < Q(x) < 1 < R(x)$
such that for every $1 \le j \le k \in \mathbb{N}$ we have

\[ Q(x)\,\lambda_p^{k-j}\,x \le \lambda_p\,\varphi^{j}(\lambda_p^{k-1} x) \le \lambda_p\,\psi^{j}(\lambda_p^{k-1} x) \le R(x)\,\lambda_p^{k-j}\,x. \]

Furthermore, $Q(x), R(x) \to 1$ when $x \to \infty$.

Using the above estimates and taking into account Hypothesis (11), we obtain, for
all $p \in (0, 1/2) \cup (2/3, 1)$ and all $x \ge m := \max(x_0, n_0, n_1, n_2)$, the inequalities

-fc/Sp+HiZn rj ..k-j 

C p ([X k p x}) < X p J - QWA " "(Q^a:)- 
< A- fc ^+ Q ^ x ) (Q{x)xy a , 

~k j3p S-j — n ^^1/ 1 \ & — 1 t 

C P ([A>]) > A p ^ QMAp ^(i?(x)x)- h 
> A-^*-^ (i?(x)x)" b , 

where 

, , 3x(2A p )^ 2 



(Q(x)x) ( ^ b)/2 A^ log(A p )(A p /3 
From these inequalities it follows that

((3/2)"^ R(x)~ b x "- b \- a ^ n-A> < C p (n) < (Q(x)~ a ^"""A" n"' 3 ", 
and the proof is finished. 

□ 

In Appendix E we prove that for $p < 1/5$ there are constants $0 < a < b$ and $n_0 \in \mathbb{N}$,
such that

\[ n^{-b} \le C_p(n) \le n^{-a} \]

for all $n \ge n_0$. In this way we establish the asymptotic scaling behavior for small
mutation probabilities.

6. Conclusions. 

The expansion-modification system we have analyzed belongs to a class of stochastic
dynamical systems defined by the action of a global random substitution. The
existence of a unique stationary measure depends on the primitivity of the stochastic
matrices describing the dynamics of the finite size marginals. The computations
used to prove decay of correlations of the stationary measure, which rely on
Equation (4), could be carried out in more general cases where relations similar to this
one hold.

The asymptotic scaling behavior of the correlation function, which reflects the
self-similar behavior (in a stochastic sense) of the system, is expected to take place in
more general cases as well. It is important here to mention the work by Messer
and co-authors [11], where a model generalizing Li's is studied, and the work of
Mansilla and Cocho [9], where the correlation function of Li's model is studied.
In the former the authors deduce an asymptotic scaling behavior from a closed
expression for the correlation function, and in the latter the authors obtain upper
and lower bounds for a "dynamical" exponent of the correlation function. The
present work follows similar ideas but in the framework of a rigorous study of
the expansion-modification systems. We were able to prove the existence and
uniqueness of the stationary measure, which attracts all initial distributions as
time goes to infinity (Theorem 1). We also proved that this stationary measure
exhibits decay of correlations (Theorem 2). We studied the scaling behavior of the
correlation function and we deduced an expression (Equation (10)) for the scaling
exponent as a function of the mutation probability. We rigorously established the
validity of that expression in the case of low mutation probabilities (Theorem 3).

The scaling behavior of the correlation function implies a scaling behavior for the
so-called power spectrum $f(\omega) := |\mathcal{F}(C_p)(\omega)|$, where $\mathcal{F}(C_p)$ denotes the discrete
Fourier transform of the correlation function. A straightforward computation shows
that $f(\omega) = O(\omega^{-\alpha_p})$ with

\[ \alpha_p = 1 - \beta_p = 1 - \frac{\log(2-p) - \log(1-2p) - \log(2-3p)}{\log(2-p)}. \]

It is worth mentioning here the work by M. Zaks [16], where the power spectrum
of systems similar to Li's is studied. There, a scaling law is deduced from
approximate recurrence relations for the Fourier transform of a realization of the
system.
In a recent paper the expansion-modification system has been studied in relation
to the universality of the rank-ordering distributions pQ . In that work the authors
numerically found an order-disorder transition which would manifest itself in the
scaling behavior. According to them, there would be a critical mutation probability
$p_c \approx 0.4$, such that for $p > p_c$, long-range correlations and consequently the scaling
behavior of $C_p$ would disappear. As we have shown, this kind of order-disorder
transition does not really occur. The apparent drop of long-range correlations for
large $p$ can be explained by the lack of statistics. Indeed, a huge amount of data is
needed to empirically compute correlation functions with fast power law decay. In
order to observe a power law decay with exponent $-5$ up to two decades, we would
need on the order of $10^{10}$ sample sequences in $\{0,1\}^{10}$. These sample sequences
should be generated by the action of the expansion-modification dynamics, over an arbitrary
seed, after a sufficiently long transient. That explains why the scaling behavior of
$C_p$ is very difficult to observe from empirical computations for $p > 0.4$, where
$\beta_p > 5$ (see Figure 2).
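Appendix A.

Claim 1. For each $\mathbf{a}, \mathbf{b} \in \{0,1\}^{\ell+1}$ there exists $\{\mathbf{s}^0, \mathbf{s}^1, \ldots, \mathbf{s}^n\} \subset \{e,m\}^{+}$ such
that $\mathbf{b} \sqsubseteq \mathbf{s}^n \circ \cdots \circ \mathbf{s}^1 \circ \mathbf{s}^0(\mathbf{a})$.

Proof. For $\ell \in \mathbb{N}_0$ and $\mathbf{a} \in \{0,1\}^{\ell+1}$, let $\mathbf{s} = e^{\ell+1}$ whenever $a_0 = 0$, otherwise let
$\mathbf{s} = m e^{\ell}$. Clearly, for each $n \ge \lceil \log(\ell+1)/\log(2) \rceil$ we have

\[ 0^{\ell+1} \sqsubseteq \mathbf{t}^n \circ \cdots \circ \mathbf{t}^1 \circ \mathbf{s}(\mathbf{a}), \]

where $\mathbf{t}^j \in \{e,m\}^{+}$ is such that $e^{\ell+1} \sqsubseteq \mathbf{t}^j$ for each $1 \le j \le n$. We will prove that
for each $\mathbf{b} \in \{0,1\}^{\ell+1}$ there exists $\{\mathbf{s}^0, \mathbf{s}^1, \ldots, \mathbf{s}^k\} \subset \{e,m\}^{+}$ such that

\[ \mathbf{b} \sqsubseteq \mathbf{s}^k \circ \cdots \circ \mathbf{s}^1 \circ \mathbf{s}^0(0^{\ell+1}), \]

which readily implies the claim.

Since $0 \sqsubseteq e(0)$ and $1 \sqsubseteq m(0)$, the claim holds for $\ell = 0$. Assuming the claim for
$\ell = l - 1$ we have, for all $\mathbf{c} \in \{0,1\}^{l+1}$, a sequence $\{\mathbf{s}^0, \mathbf{s}^1, \ldots, \mathbf{s}^k\} \subset \{e,m\}^{+}$ such
that $\mathbf{c}_1^{l} \sqsubseteq \mathbf{s}^k \circ \cdots \circ \mathbf{s}^1 \circ \mathbf{s}^0(0^{l})$. If $k$ is even and $c_0 = 1$ or $k$ is odd and $c_0 = 0$, by
taking $\mathbf{t}^j := m\,\mathbf{s}^j$ for $0 \le j \le k$, we have

\[ \mathbf{c} = c_0\,\mathbf{c}_1^{l} \sqsubseteq m^{k+1}(0)\ \mathbf{s}^k \circ \cdots \circ \mathbf{s}^1 \circ \mathbf{s}^0(0^{l}) = \mathbf{t}^k \circ \cdots \circ \mathbf{t}^1 \circ \mathbf{t}^0(0^{l+1}). \]

On the other hand, if $k$ is even and $c_0 = 0$ or $k$ is odd and $c_0 = 1$, by taking
$\mathbf{t}^{j+1} := m\,\mathbf{s}^j$ for $0 \le j \le k$, and $\mathbf{t}^0 := m e^{l}$, we have

\[ \mathbf{c} = c_0\,\mathbf{c}_1^{l} \sqsubseteq m^{k+2}(0)\ \mathbf{s}^k \circ \cdots \circ \mathbf{s}^1 \circ \mathbf{s}^0(0^{l}) \sqsubseteq \mathbf{t}^{k+1} \circ \cdots \circ \mathbf{t}^1 \circ \mathbf{t}^0(0^{l+1}), \]

and the claim follows. □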



Appendix B. 

We start with the following. 

Lemma 1. For each $p \in (0,1)$ the function

\[ q \mapsto I_p(q) := \frac{q}{q+1} \log\Big( \frac{q}{1-p} \Big) + \frac{1-q}{q+1} \log\Big( \frac{1-q}{p} \Big) \]

is non-negative, strictly convex, and satisfies

\[ \min\{ I_p(q) : q \in (0,1) \} = I_p(1-p) = 0. \]

Proof. Since 14 - log(x) is a concave function, then 

Ip(Q) = -^--(-qlog(^)-(l-q)\o^ 



On the other hand, 

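In this way we prove that $I_p$ is non-negative with minimum at $q = 1 - p$. Now,

\[ \frac{d^2 I_p}{dq^2}(q) = \frac{1}{q(1-q^2)} + \frac{2}{(q+1)^3} \Big( 2\log\Big( \frac{1-q}{p} \Big) + \log\Big( \frac{1-p}{q} \Big) \Big) > 0 \]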



EXPANSION-MODIFICATION SYSTEM 






for all $p, q \in (0,1)$. For this, note that if $1 - q \ge p$ then $1 - p \ge q$ and in this case
we have

\[ \frac{d^2 I_p}{dq^2}(q) \ge \frac{1}{q(1-q^2)} > 0, \]

otherwise, for $1 - q < p$ we have $1 - p < q$, and taking into account that $-\log(x) \ge 1 - x$,
we obtain

d 2 I p (q) _ l_ 2 (({l-qf\ (1-p) 

dq 2 q{l-q 2 ) [q + lf ° & 

> 12 / /(l-y)-'N (J. 



<z(i-? 2 ) (g + i) 3 V V p J q 
> 1 (J_. 2 {1 - q)q ]= 1 + 3q2 >o 

" q(q + l){l-q (q + l) 2 ) q(q + l) 3 (l-q) 
Therefore $I_p$ is strictly convex. □
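
The statement of Lemma 1 can be probed numerically. The Python sketch below is only an illustration with names of our own choosing: it evaluates $I_p$ and approximates its second derivative by a central finite difference, so that one can check on a grid that $I_p$ is non-negative, that it vanishes at $q = 1 - p$, and that the curvature at the minimum is close to $1/(p(1-p)(2-p))$, the value used later in this appendix.

```python
from math import log


def rate(p, q):
    """I_p(q) for p, q in the open interval (0, 1)."""
    return (q * log(q / (1 - p)) + (1 - q) * log((1 - q) / p)) / (q + 1)


def second_derivative(p, q, step=1e-4):
    """Central finite-difference approximation of the second derivative of I_p."""
    return (rate(p, q + step) - 2 * rate(p, q) + rate(p, q - step)) / step ** 2
```

A finite-difference check of this kind is of course no substitute for the proof; it only guards against misreading the formulas.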



Let $\beta > b > 0$ be given, and let $d(n)$ be defined as in Subsection 5.2. We have the
following:

Claim 2. For each $p \in (0,1)$, there exists $n_2 \in \mathbb{N}$ such that

\[ \delta_n := n^{b} \Big( \sum_{|n/k - (2-p)| > d(n)} \nu_p(k,n) \Big) \le n^{-(\beta-b)/2} \]

for all $n \ge n_2$.

Proof. A very useful refinement of Stirling's approximation, first published in [15],
states that

\[ \sqrt{2\pi n}\; n^n \exp\Big( -n + \frac{1}{12n+1} \Big) < n! < \sqrt{2\pi n}\; n^n \exp\Big( -n + \frac{1}{12n} \Big) \]

for all $n \in \mathbb{N}$. Hence, for each $p \in (0,1)$, $n \ge 3$, and $\lfloor n/2 \rfloor < k < n$, we have

\[ \exp(-\epsilon_{n,k}) \le \frac{\sqrt{2\pi k\,(n/k-1)(2-n/k)}}{k^{k}\,(n-k)^{-(n-k)}\,(2k-n)^{-(2k-n)}} \binom{k}{n-k} \le \exp(+\epsilon_{n,k}), \]

with $\epsilon_{n,k} = (4\min(n-k,\, 2k-n))^{-1}$. A simple computation shows that

\[ k^{k}\,(n-k)^{-(n-k)}\,(2k-n)^{-(2k-n)}\,(1-p)^{n-k}\,p^{2k-n} = \exp\big( -n\,I_p(n/k-1) \big), \]

with $I_p(q) := q/(q+1)\log(q/(1-p)) + (1-q)/(q+1)\log((1-q)/p)$ for each
$q \in (0,1)$. Hence

\[ \frac{e^{-n I_p(n/k-1) - \epsilon_{n,k}}}{\sqrt{2\pi k\,(n/k-1)(2-n/k)}} \le \nu_p(k+1, n+1) \le \frac{e^{-n I_p(n/k-1) + \epsilon_{n,k}}}{\sqrt{2\pi k\,(n/k-1)(2-n/k)}} \tag{15} \]

for each $n \ge 3$ and $\lfloor n/2 \rfloor < k < n$. On the other hand,

\[ \nu_p(n/2+1, n+1) = (1-p)^{n/2} = \lim_{q \to 1} \exp(-n\,I_p(q)), \]

\[ \nu_p(n+1, n+1) = p^{n} = \lim_{q \to 0} \exp(-n\,I_p(q)), \]



hence, by using 



f cxp(±l/(4 min(n-fc,2fc-n))) j f ^ fc 

1 otherwise, 






R. SALGADO-GARCIA AND E. UGALDE 



we can extend (151 to 

(16) e -n/,(n/*-l) A - < Vp (k + l,n + l) < e -« I P (n/k-l) A + 

which holds for all n > 3 and [n/2] < k < n. A simple computation shows that 



A + < max 1, 



- 2) ' ^471- (n - 2) 



= 1, 



for n > 3. 



We have already proved that the function $q \mapsto I_p(q)$ is non-negative, vanishes only
at $q = 1 - p$, and is strictly convex. Hence, Taylor's Theorem ensures that



(q — (1 — p)) 2 x min 



k-(i-p)l<e dq 2 



< I q {q) < {q - (1 - p)f x max 



|g_(i_ p )|<e dg 2 



Since d 2 I p jdq 2 \ q= \- p — (p(l —p)(2 —p))) 1 and q i-> d 2 I p /dq 2 is continuous, then, 
for all a € (0, 1) there exists e Q > such that 

a(q-(l-p)) 2 < 7 a -l( g -(l- p) )2 



p(l-p)(2-p)) 



p(l-p)(2-p)) 



for all q £ (1 — p — e Q , 1 — p + With this, and using ( 16 ), we obtain 

< ^ p (fc + 1, n + 1) < n x exp(— nl p (e)) < n x exp f — — 

|fc/n-(2-p)|>e w "' 



p(l-p)(2-p) 



for all $\epsilon \le \epsilon_\alpha$. By taking $n$ such that $d(n) := \sqrt{p(1-p)(2-p)(\beta+1)\log(n)/n} \le \epsilon_\alpha$
we obtain

\[ 0 \le \sum_{|n/k - (2-p)| > d(n)} \nu_p(k+1, n+1) \le n^{1 - \alpha(\beta+1)}, \]

and the claim follows with $\alpha = (b + \beta + 2)/(2\beta + 2)$ and $n_2$ such that $d(n) \le \epsilon_\alpha$ for
all $n \ge n_2$. □
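
The concentration of $k \mapsto \nu_p(k,n)$ around $k \approx n/(2-p)$ can also be observed directly. The following Python sketch, with function names of our own choosing, evaluates $\nu_p(k,n)$ through logarithms of Gamma functions (to avoid overflow for large $n$), locates its mode, and sums the mass lying outside the window $|n/k - (2-p)| \le d(n)$ used in Claim 2; the value of $\beta$ is a free parameter of the sketch.

```python
from math import exp, lgamma, log, sqrt


def nu(p, k, n):
    """nu_p(k, n) = C(k-1, n-k) (1-p)^(n-k) p^(2k-n-1); zero outside its support."""
    j = n - k
    if k < 1 or j < 0 or j > k - 1:
        return 0.0
    log_binom = lgamma(k) - lgamma(j + 1) - lgamma(k - j)
    return exp(log_binom + j * log(1 - p) + (k - 1 - j) * log(p))


def window_half_width(p, beta, n):
    """d(n) = sqrt(p (1-p) (2-p) (beta+1) log(n) / n)."""
    return sqrt(p * (1 - p) * (2 - p) * (beta + 1) * log(n) / n)


def tail_mass(p, beta, n):
    """Sum of nu_p(k, n) over the k with |n/k - (2-p)| > d(n)."""
    d = window_half_width(p, beta, n)
    return sum(nu(p, k, n) for k in range((n + 1) // 2, n + 1)
               if abs(n / k - (2 - p)) > d)


def mode(p, n):
    """Value of k maximizing nu_p(k, n)."""
    return max(range((n + 1) // 2, n + 1), key=lambda k: nu(p, k, n))
```

For example, with $p = 0.3$ and $n = 400$ the mode lies within a few units of $n/(2-p) \approx 235$, and the mass outside the window is a small fraction of the total mass, which is close to $1/(2-p)$.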



Appendix C. 

Claim 3. For each $p \in (0, 1/2) \cup (2/3, 1)$ there exists $n_1 \in \mathbb{N}$ such that $W_p(k,n) > 0$
in the interval $\ell(n) \le k \le u(n)$ for all $n \ge n_1$.

Proof. A simple computation shows that

\[ W_p(k,n) = \frac{(1-p)^{n-k}\,p^{2k-n}\,(k-1)!}{(n-k)!\,(2k-n+1)!}\; Q_p(k,n), \]

\[ Q_p(k,n) := (1-2p)(2n-3k)(2k-n+1) + p\,(2n-3k-2)(n-k). \]

If $p \in (0, 1/2)$ and $n$ is so large that $p < 1/2 - d(n/2)$, then $3\,u(n/(2-p)) < 2n - 3$
and in this case $Q_p(k,n) > 0$ for all $\ell(n/(2-p)) \le k \le u(n/(2-p))$. It is easy to
check that

l-p-d(n/2) n-fc ^l-p + d(n/2) 



(p + d(n/2))(n +l) _ 2/c-n + l- p - d(n/2) 
for $\ell(n/(2-p)) \le k \le u(n/(2-p))$. Hence, if $p \in (1/2, 1)$ and $n$ is so large that
$p > 1/2 + d(n/2) + 1/\sqrt{n}$, then $3\,\ell(n/(2-p)) > 2n + \sqrt{n}$, and in this case $Q_p(k,n)$
has the same sign as



1 + 



A; 



2p - 1 V 3fc - 2nJ 2k - n + 1 

for all £(n/(2 — p)) < k < u(n/(2 — p)). Here we have two possibilities, either 
p < 2/3 and 

— ( f 1 + 2 V " p + / ( r( 2) - = — < i. 

™^oo2p-lVV »V P + d(n/2) J 2-p 

or p > 2/3 and then 

(Yi+-^i, i- f7y 2) ,-^= i ^<i- 

2p-l\\ V^J (p + d(n/2))(n + l) 



From all this we conclude that the sign of $W_p(k,n)$ remains constant in the interval $\ell(n/(2-p)) \le k \le u(n/(2-p))$ for all $p \in (0,1) \setminus \{1/2, 2/3\}$ and all sufficiently large $n$. Furthermore, this sign is positive for $p \in (0,1/2) \cup (2/3,1)$ and negative for $p$ in the interval $(1/2,2/3)$. □
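The sign pattern in the conclusion can be illustrated by evaluating the polynomial $Q_p(k,n)$ defined in the proof at a single point near the centre of the window. The sketch below is only a spot check under stated assumptions: it uses $k = \mathrm{round}(n/(2-p))$ as a stand-in for the whole interval (the endpoints $\ell$ and $u$ are not evaluated), and $n = 1000$ and the three values of $p$ are arbitrary illustrative choices.

```python
def Q(p, k, n):
    """Q_p(k, n) = (1-2p)(2n-3k)(2k-n+1) + p(2n-3k-2)(n-k), as defined in the proof."""
    return (1 - 2 * p) * (2 * n - 3 * k) * (2 * k - n + 1) + p * (2 * n - 3 * k - 2) * (n - k)

def sign_at_centre(p, n):
    """Sign of Q_p at k = round(n / (2 - p)), an assumed representative of the window."""
    k = round(n / (2 - p))
    q = Q(p, k, n)
    return (q > 0) - (q < 0)

# One value of p from each regime: (0, 1/2), (1/2, 2/3) and (2/3, 1).
signs = {p: sign_at_centre(p, 1000) for p in (0.25, 0.6, 0.8)}
```

For these sample values the sign is positive, negative and positive respectively, in agreement with the three regimes described above.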



Appendix D. 

Claim 4. There exists $x_0 > e$ such that, for each $x \ge x_0$ there are constants $0 < Q(x) < 1 < R(x)$ such that for every $1 \le j \le k \in \mathbb{N}$ we have

Q(x) \ k p - J x < Xp^iX 1 '- 1 x) < ^'(Ap -1 x) < R{x) X k p ~ j x. 

Furthermore, $Q(x), R(x) \to 1$ when $x \to \infty$.

Proof. A straightforward computation shows that 

Ap^^" 1 ^ =^(A*s) and A p ^'(A*- 1 x) = u(A*i), 

for all $x > e$, $k \in \mathbb{N}$ and $1 \le j \le k$. It is easily checked that, for each $p \in (0,1)$ and $\beta > b$, there exists $x_1 > e$ such that both $\ell$ and $u$ are increasing functions in $[x_0, \infty)$.

For each $p \in (0,1)$ and $x \ge x_1$ let $Q(x)$ be the largest solution to



It is not difficult to check that $Q(x) \in (0,1)$ and since $d(x) \to 0$ as $x \to \infty$, then $Q(x) \to 1$ as $x \to \infty$.

Now, fix $k \in \mathbb{N}$, and $x_0 \ge x_1$ large enough so that $\lambda_p Q(x) > 1$ for all $x \ge x_0$. Let $Q_{k,0}(x) := 1$, and define recursively

\[
Q_{k,j+1}(x) = \frac{Q_{k,j}(x)}{1 + \lambda_p^{-(k-j)/2-1}\, Q(x)^{-1/2}\, \sqrt{k-j+1}\; d(x)}
\]
for $0 \le j \le k-1$. Clearly

\[
1 \ge Q_{k,j}(x) = \prod_{i=0}^{j-1} \left(1 + \lambda_p^{-(k-i)/2-1}\, Q(x)^{-1/2}\, \sqrt{k-i+1}\; d(x)\right)^{-1}
\ge \exp\left(-\frac{d(x)}{\sqrt{Q(x)}} \sum_{i=0}^{j-1} \sqrt{k-i+1}\; \lambda_p^{-(k-i)/2-1}\right).
\]



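Since $x > e > \lambda_p$, then $x^{k+1} > \lambda_p^k x$, and therefore

\[
d(\lambda_p^k x) = \sqrt{\frac{p(1-p)(2-p)(\beta+1)\log(\lambda_p^k x)}{\lambda_p^k x}}
\le \sqrt{\frac{p(1-p)(2-p)(\beta+1)(k+1)\log(x)}{\lambda_p^k x}} = \lambda_p^{-k/2}\sqrt{k+1}\; d(x).
\]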



Hence, for $j = 1$ we have

\[
\ell(\lambda_p^k x) = \frac{\lambda_p^{k-1} x}{1 + d(\lambda_p^k x)/\lambda_p} \ge \frac{\lambda_p^{k-1} x}{1 + \lambda_p^{-k/2-1}\sqrt{k+1}\; d(x)} \ge Q_{k,1}(x)\, \lambda_p^{k-1} x.
\]

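Suppose that $\ell^j(\lambda_p^k x) \ge Q_{k,j}(x)\, \lambda_p^{k-j} x$ for $j < k$. Since $Q_{k,j}(x)\, \lambda_p^{k-j} x < \lambda_p^{k-j} x < x^{k-j+1}$, then

\[
d(Q_{k,j}(x)\, \lambda_p^{k-j} x) = \sqrt{\frac{p(1-p)(2-p)(\beta+1)\log(Q_{k,j}(x)\, \lambda_p^{k-j} x)}{\lambda_p^{k-j}\, Q_{k,j}(x)\, x}}
\le \lambda_p^{-(k-j)/2}\, Q_{k,j}(x)^{-1/2} \sqrt{k-j+1}\; d(x)
\le \lambda_p^{-(k-j)/2}\, Q(x)^{-1/2} \sqrt{k-j+1}\; d(x).
\]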

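Taking into account that $y \mapsto \ell(y)$ is an increasing function for $y \ge x_0$, and since $Q_{k,j}(x)\, \lambda_p^{k-j} x \ge Q(x)\, x \ge x_0$, then

\[
\ell^{j+1}(\lambda_p^k x) \ge \frac{Q_{k,j}(x)\, \lambda_p^{k-j} x}{\lambda_p + d(Q_{k,j}(x)\, \lambda_p^{k-j} x)}
\ge \frac{Q_{k,j}(x)\, \lambda_p^{k-j-1} x}{1 + \lambda_p^{-(k-j)/2-1}\, Q(x)^{-1/2} \sqrt{k-j+1}\; d(x)}
= Q_{k,j+1}(x)\, \lambda_p^{k-j-1} x.
\]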
In this way we have proved that

\[
\ell^j(\lambda_p^k x) \ge Q_{k,j}(x)\, \lambda_p^{k-j} x \ge Q(x)\, \lambda_p^{k-j} x,
\]

for all $k \in \mathbb{N}$ and $1 \le j \le k$, and $x \ge x_0$.
By taking $R(x)$ the smallest solution to



d(x) x -> m + l 



\VR(x)x p ^ y a p 
the previous argument can be easily adapted to deduce

\[
u^j(\lambda_p^k x) \le R(x)\, \lambda_p^{k-j} x,
\]

for all $k \in \mathbb{N}$ and $1 \le j \le k$, and $x \ge x_0$. □
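The content of the claim can be illustrated numerically: iterating the lower map $j$ times from $\lambda_p^k x$ lands slightly below $\lambda_p^{k-j} x$, and the ratio approaches 1 as $x$ grows. The sketch below rests on assumptions that should be checked against the definitions given earlier in the paper: it takes $\lambda_p = 2 - p$ and $\ell(y) = y/(\lambda_p + d(y))$, the form that appears in the denominators of the proof, with $d$ as in Appendix B. The values of $p$, $\beta$, $k$ and $x$ are arbitrary illustrative choices.

```python
import math

def d(y, p, beta):
    """d(y) = sqrt(p(1-p)(2-p)(beta+1) log(y) / y), as in Appendix B."""
    return math.sqrt(p * (1 - p) * (2 - p) * (beta + 1) * math.log(y) / y)

def ell(y, p, beta):
    """Assumed lower map: y / (lambda_p + d(y)) with lambda_p = 2 - p (hypothetical form)."""
    return y / ((2 - p) + d(y, p, beta))

def ratio(x, k, j, p, beta):
    """ell^j(lambda_p^k x) divided by lambda_p^(k-j) x; plays the role of Q_{k,j}(x)."""
    lam = 2 - p
    y = lam ** k * x
    for _ in range(j):
        y = ell(y, p, beta)
    return y / (lam ** (k - j) * x)

# Illustrative (assumed) parameters.
p, beta, k = 0.1, 1.0, 5
r_small = ratio(1e3, k, k, p, beta)   # moderate x: ratio visibly below 1
r_large = ratio(1e6, k, k, p, beta)   # large x: ratio much closer to 1
```

Under these assumptions the ratio stays in $(0,1)$ and moves towards 1 as $x$ increases, which is the qualitative behaviour the claim asserts for $Q(x)$.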



Appendix E. 

Claim 5. For each $p \in (0, 1/10]$ there are constants $0 < a < b$ and $n_0 \in \mathbb{N}$, such that

\[
n^{-b} \le C_p(n) \le n^{-a}
\]

for all $n \ge n_0$. In this way we establish the asymptotic scaling behavior in the interval $(0, 1/10]$.

Proof. Equation Q can be rewritten as 

1 n— 1 

(17) C » = l +D »fl-2 P ) ^ C p (k)W p (k,n), 

^ 1 11 k=[n/2] 

with $W_p(k,n)$ as in Appendix C. Since $p < 1/2$, then, as we proved in Appendix C, $W_p(k,n) > 0$ whenever $3k \le 2n - 3$. On the other hand, $q \mapsto I_p(q)$ is monotonically decreasing in $[0, 1-p]$ and monotonically increasing in $[1-p, 1]$. Therefore, following the computations of Appendix B we obtain

C p (n) < i — f max CJk)) V WUfc.n) 

PV ^ ~ l+P™(l-2p) \>/2]<fc<2rj/3 PW / ^ PV ; 



fe<2n/3 



i If^l E ^,n) + 5 e -<»-«',(l-ft)], 

1 V W V fe =[»/2] ' / 

for all $n \ge 4$. Now, since $1/2 - 3/(2n) < 3/4 < 1-p$, and $q \mapsto I_p(q)$ is monotonically decreasing in $[0, 1-p]$, then

\[
I_p\left(\frac{1}{2} - \frac{3}{2n}\right) \ge I_p\left(\frac{3}{4}\right) \ge -\frac{1}{3}\log\left(4p(1-p)\right),
\]

for each $n \ge 4$ and $p \in (0, 1/10]$. With this, and taking into account that $\sum_k W_p(k,n) = (1-2p)(2-3p)/(2-p) - 2p(p-1)^n$, we obtain

for all $p \le 1/10$ and $n \ge 4$. Hence, for $N \ge 4$, taking into account that $C_p(k) \le 1/4$ for all $k \in \mathbb{N}$, we have
C„(AT + m) < i ( 1 ~ 2 P)( 2 ~ 3 P) exp ( 7 (TV)) for each < m < N, 
4 2 — p 

2 



C P (2A0 < l ^ 1 ^ 3P) ) exp( 7p (A0 +7 (2A0), 

7 P W := 



where 

1 ({2 - p)n(4p(l -p)) (n ~ 1)/3 2p(2 - p)(l -p)"' 

Ar<™ a 2W-i 1 +p n (l - 2p) I 3(1 - 2p)(2 - 3p) h (1 - 2p)(2 - 3p) 
A simple recursion on these inequalities leads to

1 /(l-2p)(2-3p) 



C p (2 K N + m) < 



P 



for all k G N and each < m < 2 K AT. Since J2T=o >( 2 ^ Ar ) < 00 for a11 P e I ' *]> 
then we can find, for $\epsilon > 0$, an integer $N \ge 4$ such that

(18) \[ C_p(n) \le n^{-((\log(2-p)-\log(1-2p)-\log(2-3p))/\log(2)+\epsilon)} \]

for each $n \ge N$.
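The $\epsilon$-free part of the exponent in (18) is an explicit function of the mutation probability, and it is easy to tabulate. The short Python sketch below evaluates it; the function name `upper_exponent` and the sample values of $p$ are assumptions introduced for illustration only.

```python
import math

def upper_exponent(p):
    """(log(2-p) - log(1-2p) - log(2-3p)) / log(2): the epsilon-free part of the exponent in (18)."""
    return (math.log(2 - p) - math.log(1 - 2 * p) - math.log(2 - 3 * p)) / math.log(2)

# Sample mutation probabilities in [0, 1/10] (illustrative choices).
values = {p: upper_exponent(p) for p in (0.0, 0.05, 0.1)}
```

The expression vanishes at $p = 0$, consistent with $C_0(n) = 1/4$ for all $n$, and increases with $p$ on this range, reaching roughly $0.48$ at $p = 1/10$.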

The proof of the lower bound reduces to a recursion as well, but in order to establish the seed of the recursion, we will need a bit of brute force. Using (17) we explicitly compute the correlation function $C_p(n)$, for $n$ small. It can be checked that all of the functions $p \mapsto C_p(n)$ are positive, monotonically decreasing and log concave in $[0, 1/10]$. By log concave we mean that

\[
\log(C_p(n)) \ge (1-10p)\log(C_0(n)) + 10p\log(C_{1/10}(n))
= \log(C_0(n)) + 10p\log\left(\frac{C_{1/10}(n)}{C_0(n)}\right)
= -\log(4) + 10p\log(4C_{1/10}(n)),
\]

which holds for all $1 \le n \le 25$ and $p \in [0, 1/10]$. Here we are using that $C_0(n) = 1/4$ for all $n \in \mathbb{N}$. From the log concavity we can easily find a power law bounding $C_p(n)$ from below. Indeed, by taking



\[
b_p := 0.60206 + 5.62823\, p \ge \max_{12 \le n \le 25} \frac{\log(4) - 10p\log(4C_{1/10}(n))}{\log(n)}
\]

we have $C_p(n) \ge n^{-b_p}$ for all $12 \le n \le 25$ and $p \in [0, 1/10]$.
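At $p = 0$ the condition on $b_p$ can be verified without any computed correlation values, because the term involving $C_{1/10}(n)$ drops out and the bracket reduces to $\log(4)/\log(n)$. The Python sketch below performs only this endpoint check; it does not reproduce the Maxima computation of $C_{1/10}(n)$, and the helper name `b` is an assumption used for illustration.

```python
import math

def b(p):
    """The exponent b_p = 0.60206 + 5.62823 p chosen in the proof."""
    return 0.60206 + 5.62823 * p

# At p = 0 the maximum is taken over log(4)/log(n) for 12 <= n <= 25,
# which is attained at n = 12 since log(n) is increasing.
worst_at_p0 = max(math.log(4) / math.log(n) for n in range(12, 26))
margin_at_p0 = b(0.0) - worst_at_p0
```

The intercept $0.60206$ agrees with $\log_{10} 4$ to five decimals, and it exceeds the maximum at $p = 0$, which is $\log(4)/\log(12) \approx 0.558$.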

As we showed above, $I_p(1/2 - 3/(2n)) > -\log(4p(1-p))/3$ for each $n \ge 4$ and $p \in (0, 1/10]$, therefore



W p {k,n)C p (k) 

k>2n/3 



< 



1 n(4p(l -p)) ( ™~ 1)/3 



for each $n \ge 4$ and $p \in [0, 1/10]$. Taking into account the power law bound obtained by direct computations, $C_p(n) \ge n^{-b_p}$, and using (17), we have



C p (n) > 



> 



> 



(E k <2n/ 3 Wp(k,n)) (min k<2n/3 C p (k)) - s (4p(l - p)) ( "- 1)/3 
1 +p n + 1 (l - 2p) 

s fc <n w P (fc,»)) (f y bp - § (4 P (i - P )) ( - l)/3 

1 + p"+ 1 (l - 2p) 

(4 P (1-P)) ( - 1)/3 J /? n 
3 



(l-2p)(2-3p) 
2-p 



2p(p - l) n - 2 b ?- 2 ( r j) 1+bp 



1 +p"+ 1 (l - 2p) 



Footnote: We used Maxima to compute $C_p(n)$ for $1 \le n \le 25$.
for all $n$ such that $25 < n$ and $2n/3 \le 25$, i.e., $25 < n \le 37$. It can be checked that
( (1 - 2 1 ( ;- 3P) - 2p(p- 1)» - 2 b - 2 (f ) 1+bp (4p(l -p)) ( - 1)/3 ) / 3 , 



l+p«+i(l-2p) V2 
decreases with $p$ and increases with $n$. Since $a_{1/10}(25) \approx 1.0999111 > 1$, it follows that $C_p(n) \ge n^{-b_p}$ for all $12 \le n \le 37$ and $p \in [0, 1/10]$. From here, a standard induction implies that $C_p(n) \ge n^{-b_p}$ for all $n \ge 12$ and $p \in [0, 1/10]$, and the claim follows. □

We could, in the previous claim, enlarge the interval of mutation probabilities in which upper and lower power law bounds for the correlation function can be found. To this aim, and following the same scheme of proof, it would be necessary to explicitly compute $C_p(n)$ for larger values of $n$. By doing so we could in principle replace $[0, 1/10]$ by $[0, p^*)$, where $p^* := \sup\{p \in (0,1) : C_p(n) > 0 \ \forall n \in \mathbb{N}\} \approx 0.28$.